    Enhanced suffix arrays as language models: Virtual k-testable languages

    In this article, we propose the use of suffix arrays to efficiently implement n-gram language models with practically unlimited size n. This approach, which is used with synchronous back-off, allows us to distinguish between alternative sequences using large contexts. We also show that we can build this kind of model with additional information for each symbol, such as part-of-speech tags and dependency information. The approach can also be viewed as a collection of virtual k-testable automata. Once built, we can directly access the results of any k-testable automaton generated from the input training data. Synchronous back-off automatically identifies the k-testable automaton with the largest feasible k. We have used this approach in several classification tasks.
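The core mechanism behind "practically unlimited size n" is that a suffix array answers frequency queries for an n-gram of any length with two binary searches. The sketch below is a minimal illustration under that assumption, not the authors' implementation: the function names are hypothetical, construction is naive (real systems use linear-time algorithms and keep extra tables), and smoothing and probability estimation are left out.

```python
from typing import List, Sequence


def build_suffix_array(tokens: Sequence[str]) -> List[int]:
    """Sort suffix start positions by the token suffix each one begins.

    Naive O(n^2 log n) construction, fine for a toy corpus only.
    """
    return sorted(range(len(tokens)), key=lambda i: list(tokens[i:]))


def count_ngram(tokens: Sequence[str], sa: List[int], ngram: Sequence[str]) -> int:
    """Count occurrences of an n-gram of any length with two binary searches."""
    pattern = list(ngram)
    n = len(pattern)

    def prefix(rank: int) -> List[str]:
        # First n tokens of the suffix stored at position `rank` of the array.
        start = sa[rank]
        return list(tokens[start:start + n])

    # Lower bound: first suffix whose n-token prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) < pattern:
            lo = mid + 1
        else:
            hi = mid
    first = lo

    # Upper bound: first suffix whose n-token prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) <= pattern:
            lo = mid + 1
        else:
            hi = mid
    # Suffixes sharing the same n-token prefix are contiguous in sorted order.
    return lo - first
```

For example, with `tokens = "the cat sat on the mat the cat ran".split()` and `sa = build_suffix_array(tokens)`, `count_ngram(tokens, sa, ["the", "cat"])` returns 2. The same array serves every n, which is what lets a back-off strategy try very long contexts first.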

    Finding patterns in strings using suffix arrays

    Finding regularities in large data sets requires implementations of systems that are efficient in both time and space requirements. Here, we describe a newly developed system that exploits the internal structure of the enhanced suffix array to find significant patterns in a large collection of sequences. The system searches exhaustively for all significantly compressing patterns, where patterns may consist of symbols and skips or wildcards. We demonstrate a possible application of the system by detecting interesting patterns in a Dutch and an English corpus.
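One way the "internal structure" of an enhanced suffix array can be exploited is through its LCP table (longest common prefix between neighbouring suffixes): every run of LCP values of at least k marks a k-gram that occurs more than once. The sketch below assumes that reading; it covers contiguous patterns only (no skips or wildcards), and `compression_gain` is a hypothetical toy score, not the significance test used by the system described above.

```python
from typing import Dict, List, Sequence, Tuple


def build_suffix_array(tokens: Sequence[str]) -> List[int]:
    # Naive construction; adequate for illustration only.
    return sorted(range(len(tokens)), key=lambda i: list(tokens[i:]))


def lcp_array(tokens: Sequence[str], sa: List[int]) -> List[int]:
    """Kasai's algorithm: lcp[r] is the LCP of suffixes sa[r-1] and sa[r]."""
    n = len(tokens)
    rank = [0] * n
    for r, start in enumerate(sa):
        rank[start] = r
    lcp = [0] * n
    h = 0
    for i in range(n):
        if rank[i] > 0:
            j = sa[rank[i] - 1]
            while i + h < n and j + h < n and tokens[i + h] == tokens[j + h]:
                h += 1
            lcp[rank[i]] = h
            if h > 0:
                h -= 1  # the next suffix keeps at least h-1 matching tokens
        else:
            h = 0
    return lcp


def repeated_kgrams(tokens: Sequence[str], sa: List[int], lcp: List[int],
                    k: int) -> Dict[Tuple[str, ...], int]:
    """Map each k-gram occurring at least twice to its frequency in one pass."""
    result: Dict[Tuple[str, ...], int] = {}
    i, n = 1, len(sa)
    while i < n:
        if lcp[i] >= k:
            start = i - 1
            while i < n and lcp[i] >= k:
                i += 1
            # Suffixes sa[start .. i-1] all begin with the same k tokens.
            result[tuple(tokens[sa[start]:sa[start] + k])] = i - start
        else:
            i += 1
    return result


def compression_gain(count: int, k: int) -> int:
    # Hypothetical toy score: replacing each occurrence with one new symbol
    # saves count*(k-1) tokens, and storing the pattern once costs k tokens.
    return count * (k - 1) - k
```

On `"a b c a b c a b d".split()`, `repeated_kgrams(..., k=2)` finds `("a", "b")` three times and `("b", "c")` and `("c", "a")` twice each. Scanning LCP runs avoids comparing every pair of positions.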

    Rademacher complexity and grammar induction algorithms: What it may (not) tell us

    Token merging in language model-based confusible disambiguation

    In the context of confusible disambiguation (spelling correction that requires context), the synchronous back-off strategy combined with traditional n-gram language models performs well. However, when alternatives consist of a different number of tokens, this classification technique cannot be applied directly, because the computation of the probabilities is skewed. Previous work has already shown that probabilities based on n-grams of different orders should not be compared directly. In this article, we propose new probability metrics in which the size of n is varied according to the number of tokens of the confusible alternative. This requires access to n-grams of variable length. Results show that the synchronous back-off method is extremely robust. We discuss the use of suffix trees as a technique to store variable-length n-gram information efficiently.
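Synchronous back-off compares all alternatives at the same n-gram order and backs off to a smaller n only when none of them has been seen at the current order, so scores from different orders are never compared. The sketch below assumes single-token alternatives only (the multi-token case is exactly what the article addresses) and scores each alternative by summed raw counts of the n-grams covering it. That scoring, the function names, and the use of a `Counter` in place of a suffix tree are simplifications chosen for illustration.

```python
from collections import Counter
from typing import Optional, Sequence, Tuple


def ngram_counts(tokens: Sequence[str], max_n: int) -> Counter:
    """Count every n-gram up to max_n (a stand-in for a suffix tree)."""
    counts: Counter = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


def score(counts: Counter, left: Sequence[str], alt: str,
          right: Sequence[str], n: int) -> int:
    """Sum the counts of all n-grams that cover the confusible position."""
    sent = list(left) + [alt] + list(right)
    f = len(left)  # index of the confusible in the sentence
    total = 0
    for s in range(max(0, f - n + 1), min(f, len(sent) - n) + 1):
        total += counts[tuple(sent[s:s + n])]
    return total


def synchronous_backoff(counts: Counter, left: Sequence[str], right: Sequence[str],
                        alternatives: Sequence[str],
                        max_n: int) -> Tuple[Optional[str], int]:
    """Return (chosen alternative, order used), or (None, 0) if nothing was seen."""
    for n in range(max_n, 0, -1):
        scores = {alt: score(counts, left, alt, right, n) for alt in alternatives}
        if any(scores.values()):
            # Decide at this order for all alternatives at once; ties favour
            # the alternative listed first.
            return max(alternatives, key=lambda a: scores[a]), n
    return None, 0
```

Trained on `"we went there yesterday . they lost their keys . we sat there quietly".split()` with `max_n=3`, the context `we went _ yesterday` is decided at order 3 in favour of `there`, while the unseen context `we found _ car` backs off for both alternatives together down to unigrams.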

    The uncanny valley of a virtual animal.

    Virtual robots, including virtual animals, are expected to play a major role within affective and aesthetic interfaces, serious games, video instruction, and the personalization of educational instruction. Their actual impact, however, will very much depend on user perception of virtual characters, as the uncanny valley hypothesis has shown that the design of virtual characters determines user experiences. In this article, we investigated whether the uncanny valley effect, which has already been found for the human-like appearance of virtual characters, can also be found for animal-like appearances. We conducted an online study (N = 163) in which six different animal designs were evaluated in terms of the following properties: familiarity, commonality, naturalness, attractiveness, interestingness, and animateness. The study participants differed in age (ranging from under 10 to 60 years) and origin (Europe, Asia, North America, and South America). For the evaluation of the results, we ranked the animal-likeness of the characters using both expert opinion and participant judgments. In addition, we investigated the effect of movement and morbidity. The results confirm the existence of the uncanny valley effect for virtual animals, especially with respect to familiarity and commonality, for both still and moving images. The effect was particularly pronounced for morbid images. For naturalness and attractiveness, the effect was present only in the expert-based ranking, not in the participant-based ranking. No uncanny valley effect was detected for interestingness and animateness. This investigation revealed that the appearance of virtual animals directly affects user perception and thus, presumably, impacts user experience when used in applied settings.

    The gas of elastic quantum strings in 2+1 dimensions: finite temperatures

    The finite-temperature physics of the gas of elastic quantum strings, as introduced in J. Zaanen, Phys. Rev. Lett. 84, 753, is investigated. This model is inspired by the stripes in the high-Tc superconductors. We analyze in detail how the kinetic interactions of the zero-temperature quantum problem cross over into the entropic interactions of the high-temperature limit. Comment: 14 pages, 2 figures

    Quantizing Charged Magnetic Domain Walls: Strings on a Lattice

    The discovery by Tranquada et al. of an ordered phase of charged domain walls in the high-Tc cuprates leads us to consider the possible existence of a quantum domain-wall liquid. We propose minimal models for the quantization, by meandering fluctuations, of isolated charged domain walls. These correspond to lattice string models. The simplest model of this kind, a directed lattice string, can be mapped onto a quantum spin chain or onto a classical two-dimensional solid-on-solid surface model. The model exhibits a rich phase diagram, containing several rough phases with low-lying excitations as well as ordered phases which are gapped. Comment: 4 two-column pages, including the 3 PostScript figures